Abstract: In this work, we study the problem of evaluating the performance limits of two closely related communication problems: source coding with feed-forward and channel coding with feedback. The formulas (involving directed information) for the optimal rate-distortion function with feed-forward and channel capacity with feedback are multi-letter expressions and cannot be computed easily in general. We derive conditions under which these can be computed for a large class of sources/channels with memory and distortion/cost measures. Illustrative examples are also provided.

I. Introduction 

Feedback is widely used in communication systems to help combat the effect of noisy channels. It is well-known that feedback does not increase the capacity of a discrete memoryless channel [1]. However, feedback could increase the capacity of a channel with memory. Recently, directed information has been used to elegantly characterize the capacity of channels with feedback [2], [3], [4], [5]. The source coding counterpart of channel coding with feedback is source coding with feed-forward. Channels with feedback have been studied extensively, but the problem of source coding with feed-forward is recent [6], [7], [8], [9].

Source coding with feed-forward can be explained in simple terms as follows. In the usual fixed-rate lossy source coding problem, there is a source X that has to be reconstructed at a decoder with some distortion D. The encoder takes a block of, say, N source samples and maps it to an index in a codebook. The decoder uses this index to generate the reconstruction of the N source samples. In source coding with feed-forward, the encoder works in a similar fashion and sends an index to the decoder. The decoder generates the reconstructions sequentially: in order to reconstruct each source sample, the decoder has access to the index as well as some past source samples. More precisely, let $X_n$, $\hat{X}_n$ denote the source and reconstruction samples at time n, respectively. If the source samples are available with a delay k after the index is sent, then to generate $\hat{X}_n$, the decoder has knowledge of the index plus the source samples until time $n - k$. This problem is called feed-forward with delay k, and it is of interest to study the rate vs. distortion trade-offs in this setting [6], [9].

Source coding with feed-forward was considered in the context of competitive prediction in [6]. The problem was motivated and studied in [7], [8], [9] from a communications perspective, as a variant of source coding with side information. For instance, we can consider the source to be a field that needs to be compressed and communicated from one node to another in a network. This field (e.g., a seismic or acoustic field) could propagate through the medium at a slow rate and become available at the decoding node as side information with some delay. Later in this paper, we will present an example of feed-forward relating to predicting variations in stock prices.

The formulas (involving directed information) for the optimal rate-distortion function with feed-forward [9] and channel capacity with feedback [4] are multi-letter expressions and cannot be computed easily in general. In this work, we study the problem of evaluating the rate-distortion and capacity expressions. We derive conditions under which these can be computed for a large class of sources (channels) with memory and distortion (cost) measures. We also provide illustrative examples. Throughout, we consider source feed-forward and channel feedback with arbitrary delay. When the delay goes to $\infty$, we obtain the case of no feed-forward/feedback.

II. Source Coding with Feed-Forward 

A. Problem Formulation 

Consider a general discrete source X with alphabet $\mathcal{X}$, characterized by a sequence of distributions denoted $P_X = \{P_{X^n}\}_{n=1}^{\infty}$. The reconstruction alphabet is $\hat{\mathcal{X}}$ and there is an associated sequence of distortion measures $d_n : \mathcal{X}^n \times \hat{\mathcal{X}}^n \to \mathbb{R}^+$. It is assumed that $d_n(x^n, \hat{x}^n)$ is normalized with respect to n and is uniformly bounded in n. For example, $d_n(x^n, \hat{x}^n)$ may be the average per-letter distortion, i.e., $\frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i)$ for some $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$.

Definition 1: An $(N, 2^{NR})$ source code with delay k feed-forward of block length N and rate R consists of an encoder mapping e and a sequence of decoder mappings $g_i$, $i = 1, \ldots, N$, where

$$e : \mathcal{X}^N \to \{1, \ldots, 2^{NR}\},$$

$$g_i : \{1, \ldots, 2^{NR}\} \times \mathcal{X}^{i-k} \to \hat{\mathcal{X}}, \quad i = 1, \ldots, N.$$

The encoder maps each N-length source sequence to an index in $\{1, \ldots, 2^{NR}\}$. The decoder receives the index transmitted by the encoder, and to reconstruct the ith sample ($i > k$), it has access to the source samples until time $(i - k)$ (for $i \le k$, $\hat{X}_i$ is produced using the index alone). We want to minimize R for a given distortion constraint.
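The information pattern in Definition 1 can be illustrated with a minimal Python sketch. The function names `decode_with_feedforward` and `toy_g` are hypothetical and not part of the paper; the toy decoder is only a placeholder meant to show which source samples each decoder mapping is allowed to see, not a good code.

```python
from typing import Callable, List, Sequence

def decode_with_feedforward(
    index: int,
    source: Sequence[int],
    k: int,
    g: Callable[[int, int, Sequence[int]], int],
) -> List[int]:
    """Reconstruct N samples sequentially with delay-k feed-forward.

    At time i (1-based) the decoder mapping g sees the index and the
    source samples x_1, ..., x_{i-k}; for i <= k it sees the index alone.
    """
    n = len(source)
    recon = []
    for i in range(1, n + 1):
        visible = source[: max(i - k, 0)]  # empty when i <= k
        recon.append(g(i, index, visible))
    return recon

def toy_g(i: int, index: int, visible: Sequence[int]) -> int:
    # Hypothetical decoder: repeat the newest visible sample,
    # or fall back to the index when nothing is visible yet.
    return visible[-1] if visible else index

source = [3, 1, 4, 1, 5]
recon = decode_with_feedforward(index=0, source=source, k=2, g=toy_g)
print(recon)
```

With k = 2 the first two reconstructions depend on the index only, and from the third sample onward the decoder may use the source with a lag of two.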

Definition 2: (Probability of error criterion) R is an $\epsilon$-achievable rate at distortion D if for all sufficiently large N, there exists an $(N, 2^{NR})$ source codebook such that

$$P_{X^N}\left(x^N : d_N(x^N, \hat{x}^N) > D\right) < \epsilon,$$

where $\hat{x}^N$ denotes the reconstruction of $x^N$. R is an achievable rate at probability-1 distortion D if it is $\epsilon$-achievable for every $\epsilon > 0$.

We now give a brief summary of the rate-distortion results with feed-forward found in [9]. The rate-distortion function with feed-forward (delay 1) is characterized by directed information, a quantity defined in [2]. The directed information flowing from a random sequence $\hat{X}^N$ to a random sequence $X^N$ is defined as

$$I(\hat{X}^N \to X^N) = \sum_{n=1}^{N} I(\hat{X}^n; X_n \mid X^{n-1}). \tag{1}$$

When the feed-forward delay is k, the rate-distortion function is characterized by the k-delay version of the directed information:

$$I_k(\hat{X}^N \to X^N) = \sum_{n=1}^{N} I(\hat{X}^{n+k-1}; X_n \mid X^{n-1}). \tag{2}$$
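For small block lengths, the sums in (1) and (2) can be evaluated directly from a joint probability mass function. The following Python sketch is only an illustration: the representation of the joint pmf as a dictionary keyed by `(x_tuple, xhat_tuple)` and the function names are assumptions made here, and the reconstruction index $n + k - 1$ is truncated at N, which is an assumption about how to handle a finite block.

```python
import math
from collections import defaultdict
from itertools import product

def cond_mutual_info(joint, a_of, b_of, c_of):
    """I(A; B | C) in bits. `joint` maps outcomes to probabilities;
    a_of, b_of, c_of extract the three variables from an outcome."""
    p_abc, p_ac, p_bc, p_c = (defaultdict(float) for _ in range(4))
    for w, p in joint.items():
        a, b, c = a_of(w), b_of(w), c_of(w)
        p_abc[a, b, c] += p
        p_ac[a, c] += p
        p_bc[b, c] += p
        p_c[c] += p
    return sum(
        p * math.log2(p * p_c[c] / (p_ac[a, c] * p_bc[b, c]))
        for (a, b, c), p in p_abc.items()
        if p > 0
    )

def directed_info(joint, N, k=1):
    """Sum over n of I(Xhat^{min(n+k-1, N)}; X_n | X^{n-1}), in bits.
    Outcomes are pairs (x_tuple, xhat_tuple), each of length N."""
    total = 0.0
    for n in range(1, N + 1):
        m = min(n + k - 1, N)  # truncation at N is an assumption
        total += cond_mutual_info(
            joint,
            a_of=lambda w, m=m: w[1][:m],
            b_of=lambda w, n=n: w[0][n - 1],
            c_of=lambda w, n=n: w[0][: n - 1],
        )
    return total

# Two i.i.d. uniform bits reconstructed perfectly (xhat = x).
copy_joint = {((x1, x2), (x1, x2)): 0.25 for x1, x2 in product((0, 1), repeat=2)}
# Reconstruction independent of the source.
indep_joint = {((x1, x2), (h1, h2)): 1 / 16
               for x1, x2, h1, h2 in product((0, 1), repeat=4)}

print(directed_info(copy_joint, N=2, k=1))
print(directed_info(indep_joint, N=2, k=1))
```

For the perfect-copy pmf the directed information equals the entropy of the two source bits, while an independent reconstruction carries none.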



When we do not make any assumption on the nature of the joint process $\{X, \hat{X}\}$, we need to use the information spectrum [10] version of (2). In particular, we will need the quantity$^1$

$$\bar{I}_k(\hat{X} \to X) = \limsup_{\text{inprob}} \frac{1}{n} \log \frac{P_{X^n, \hat{X}^n}}{\vec{P}^k_{\hat{X}^n|X^n} \cdot P_{X^n}}, \tag{3}$$

where

$$\vec{P}^k_{\hat{X}^n|X^n} = \prod_{i=1}^{n} P_{\hat{X}_i \mid \hat{X}^{i-1}, X^{i-k}}.$$

It should be noted that (2) and (3) are the same when the joint process $\{X, \hat{X}\}$ is stationary and ergodic.

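Theorem 1: [9] For an arbitrary source X characterized by a distribution $P_X$, the rate-distortion function with feed-forward (the infimum of all achievable rates at distortion D) is given by

$$R_{ff}(D) = \inf_{P_{\hat{X}|X} :\, \rho(P_{\hat{X}|X}) \le D} \bar{I}_k(\hat{X} \to X), \tag{4}$$

where

$$\rho(P_{\hat{X}|X}) = \limsup_{\text{inprob}} d_n(x^n, \hat{x}^n) = \inf\left\{ h : \lim_{n \to \infty} P_{X^n, \hat{X}^n}\left((x^n, \hat{x}^n) : d_n(x^n, \hat{x}^n) > h\right) = 0 \right\}. \tag{5}$$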



B. Evaluating the Rate-Distortion Function with Feed-forward 

The rate-distortion formula in Theorem 1 is an optimization of a multi-letter expression:

$$\bar{I}_k(\hat{X} \to X) = \limsup_{\text{inprob}} \frac{1}{n} \log \frac{P_{X^n, \hat{X}^n}}{\vec{P}^k_{\hat{X}^n|X^n} \cdot P_{X^n}}.$$

This is an optimization over an infinite dimensional space of conditional distributions $P_{\hat{X}|X}$. Since this is a potentially difficult optimization, we turn the problem on its head and pose the following question:

Given a source X with distribution $P_X$ and a conditional distribution $P_{\hat{X}|X}$, for what sequence of distortion measures does $P_{\hat{X}|X}$ achieve the infimum in the rate-distortion formula?

A similar approach is used in [11] (Problem 2 and 3, p. 
147) to find optimizing distributions for discrete memoryless 
channels and sources without feedback/feed-forward. It is also 
used in [12] to study the optimality of transmitting uncoded 
source data over channels and in [13] to study the duality 
between source and channel coding. 

Given a source X, suppose we have a hunch about the 
structure of the optimal conditional distribution. The following 
theorem (proof omitted) provides the distortion measures for 
which our hunch is correct. 

Theorem 2: Suppose we are given a stationary, ergodic source X characterized by $P_X = \{P_{X^n}\}_{n=1}^{\infty}$ with feed-forward delay k. Let $P_{\hat{X}|X} = \{P_{\hat{X}^n|X^n}\}_{n=1}^{\infty}$ be a conditional distribution such that the joint distribution is stationary and ergodic. Then $P_{\hat{X}|X}$ achieves the rate-distortion function if for all sufficiently large n, the distortion measure satisfies

$$d_n(x^n, \hat{x}^n) = -c \cdot \frac{1}{n} \log \frac{P_{X^n, \hat{X}^n}(x^n, \hat{x}^n)}{\vec{P}^k_{\hat{X}^n|X^n}(\hat{x}^n \mid x^n)} + d_0(x^n), \tag{6}$$

where $\vec{P}^k_{\hat{X}^n|X^n}(\hat{x}^n \mid x^n) = \prod_{i=1}^{n} P_{\hat{X}_i \mid X^{i-k}, \hat{X}^{i-1}}(\hat{x}_i \mid x^{i-k}, \hat{x}^{i-1})$, c is any positive number and $d_0(\cdot)$ is an arbitrary function. The distortion constraint in this case is equal to $\limsup_{n \to \infty} d_n(x^n, \hat{x}^n)$.

We have considered a conditional distribution $P_{\hat{X}|X}$ such that $P_X P_{\hat{X}|X} = \{P_{X^n} P_{\hat{X}^n|X^n}\}_{n=1}^{\infty}$ is stationary, ergodic. Nevertheless, the theorem gives the condition for optimality of $P_{\hat{X}|X}$ among all conditional distributions, not just the ones that make the joint distribution stationary and ergodic.

C. Markov Sources with Feed-forward 

A stationary, ergodic mth order Markov source X is characterized by a distribution $P_X = \{P_{X^n}\}_{n=1}^{\infty}$ where

$$P_{X^n} = \prod_{i=1}^{n} P_{X_i \mid X_{i-m}^{i-1}}, \quad \forall n. \tag{7}$$

$^1$ The lim sup inprob of a random sequence $A_n$ is defined as the smallest number $\alpha$ such that $\lim_{n \to \infty} P(A_n > \alpha) = 0$ and is denoted $\bar{A}$.

Fig. 1. Markov chain representing the stock value



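Let the source have feed-forward with delay k. We first ask: When is the optimal joint distribution also mth order Markov in the following sense:

$$P_{X^n, \hat{X}^n} = \prod_{i=1}^{n} P_{X_i, \hat{X}_i \mid X_{i-m}^{i-1}}, \quad \forall n. \tag{8}$$

In other words, when does the optimizing conditional distribution have the form

$$P_{\hat{X}^n|X^n} = \prod_{i=1}^{n} P_{\hat{X}_i \mid X_{i-m}^{i}}, \quad \forall n. \tag{9}$$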



The answer, provided by Theorem 2, is stated below. We drop the subscripts on the probabilities to keep the notation clean.

Corollary 1: For an mth order Markov source (described in (7)) with feed-forward delay k, an mth order conditional distribution (described in (9)) achieves the optimum in the rate-distortion function for a sequence of distortion measures $\{d_n\}$ given by

$$d_n(x^n, \hat{x}^n) = -c \cdot \frac{1}{n} \sum_{i=1}^{n} \log \frac{P(x_i, \hat{x}_i \mid x_{i-m}^{i-1})}{P(\hat{x}_i \mid \hat{x}_{i-k+1}^{i-1}, x_{i-k+1-m}^{i-k})} + d_0(x^n), \tag{10}$$

where c is any positive number and $d_0(\cdot)$ is an arbitrary function.

Proof: The proof involves substituting (7) and (9) in (6) and performing a few manipulations.

III. Examples 

A. Stock-market example 

Suppose that we wish to observe the behavior of a particular stock in the stock market over an N-day period. Assume that the value of the stock can take $k + 1$ different values and is modeled as a $(k+1)$-state Markov chain, as shown in Fig. 1. If on a particular day, the stock is in state i, $1 \le i < k$, then on the next day, one of the following can happen.

• The value increases to state $i + 1$ with probability $p_i$.

• The value drops to state $i - 1$ with probability $q_i$.

• The value remains the same with probability $1 - p_i - q_i$.

When the stock value is in state 0, the value cannot decrease. Similarly, when in state k, the value cannot increase. Suppose an investor invests in this stock over an N-day period and desires to be forewarned whenever the value drops. Assume that there is an insider (with some a priori information about the behavior of the stock over the N days) who can send information to the investor at a finite rate.



The value of the stock is modeled as a Markov source $X = \{X_n\}$. The decision $\hat{X}_n$ of the investor is binary: $\hat{X}_n = 1$ indicates that the price is going to drop from day $n - 1$ to n, $\hat{X}_n = 0$ means otherwise. Before day n, the investor knows all the previous values of the stock $X^{n-1}$ and has to make the decision $\hat{X}_n$. Thus feed-forward is automatically built into the problem.

The investor makes an error either when she fails to predict a drop or when she falsely predicts a drop. The distortion is modeled using a Hamming distortion criterion as follows.

$$d_n(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} e(\hat{x}_i, x_{i-1}, x_i), \tag{11}$$

where $e(\cdot, \cdot, \cdot)$ is the per-letter distortion given in Table I. The minimum amount of information (in bits/sample) the insider needs to convey to the investor so that she can predict drops in value with distortion D is denoted $R_{ff}(D)$.
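The distortion in (11) can be sketched in Python as follows. Since Table I is not reproduced in the text, the rule in `drop_error` is an assumption based on the verbal description (an error is a missed drop or a falsely predicted drop); the function names and the explicit initial value `x0` for day 0 are likewise illustrative choices.

```python
from typing import Sequence

def drop_error(xhat_i: int, x_prev: int, x_i: int) -> int:
    """Per-letter distortion e(xhat_i, x_{i-1}, x_i).

    Assumed rule: 1 if the prediction (xhat_i == 1 means "drop")
    disagrees with whether the value actually dropped, else 0.
    """
    dropped = x_i < x_prev
    return int((xhat_i == 1) != dropped)

def average_distortion(x0: int, x: Sequence[int], xhat: Sequence[int]) -> float:
    """d_n = (1/n) * sum_i e(xhat_i, x_{i-1}, x_i), where x0 is the
    stock value on the day before the first observed day."""
    prev = [x0] + list(x[:-1])
    return sum(drop_error(h, p, c) for h, p, c in zip(xhat, prev, x)) / len(x)

# Values over four days, starting from state 2, with the investor's guesses.
values = [1, 1, 2, 1]
guesses = [1, 0, 1, 1]
d = average_distortion(2, values, guesses)
print(d)
```

In this toy run the only mistake is on day 3, where a drop is predicted but the value rises.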

Proposition 1: For the stock-market problem described above,

$$R_{ff}(D) = \sum_{i=1}^{k-1} \pi_i \left( h(p_i, q_i, 1 - p_i - q_i) - h(\epsilon, 1 - \epsilon) \right) + \pi_k \left( h(q_k, 1 - q_k) - h(\epsilon, 1 - \epsilon) \right),$$

where $h(\cdot)$ is the entropy function, $[\pi_0, \pi_1, \ldots, \pi_k]$ is the stationary distribution of the Markov chain and $\epsilon = \frac{D}{1 - \pi_0}$.
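Two ingredients of Proposition 1, the stationary distribution of the chain and the parameter $\epsilon$, are easy to compute numerically. The Python sketch below is illustrative only: the function names are hypothetical, the stationary distribution is obtained from the detailed-balance relation $\pi_{i-1} p_{i-1} = \pi_i q_i$ that holds for birth-death chains of this kind, and $\epsilon = D / (1 - \pi_0)$ is taken from the statement above.

```python
from typing import List, Sequence

def stationary_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Stationary distribution of a birth-death chain on states 0..k.

    p[i] = P(i -> i+1) and q[i] = P(i -> i-1); the model requires
    q[0] == 0 and p[k] == 0, and q[i] > 0 for i >= 1.
    """
    k = len(p) - 1
    weights = [1.0]
    for i in range(1, k + 1):
        # Detailed balance: pi_{i-1} * p_{i-1} = pi_i * q_i.
        weights.append(weights[-1] * p[i - 1] / q[i])
    total = sum(weights)
    return [w / total for w in weights]

def epsilon_for_distortion(D: float, pi: Sequence[float]) -> float:
    """epsilon = D / (1 - pi_0), as in the statement of Proposition 1."""
    return D / (1.0 - pi[0])

# Hypothetical three-state example (k = 2).
p = [0.5, 0.25, 0.0]
q = [0.0, 0.25, 0.5]
pi = stationary_distribution(p, q)
eps = epsilon_for_distortion(0.15, pi)
print(pi, eps)
```

The transition probabilities in the example are made up for illustration and do not come from the paper.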

Proof: We will use Corollary 1 to verify that a first-order Markov conditional distribution of the form

$$P_{\hat{X}^n|X^n} = \prod_{i=1}^{n} P_{\hat{X}_i \mid X_i, X_{i-1}}, \quad \forall n \tag{12}$$

achieves the optimum.

Due to the structure of the distortion function in Table I, we choose the structure of $P(x_i \mid \hat{x}_i, x_{i-1})$ as follows. When $x_{i-1} = 0$, the decoder can always declare $\hat{X}_i = 0$: there is no error irrespective of the value of $X_i$. So we assign $P(\hat{X}_i = 0 \mid x_{i-1} = 0, X_i = 0) = P(\hat{X}_i = 0 \mid x_{i-1} = 0, X_i = 1) = 1$, which gives $P(X_i = 0 \mid x_{i-1} = 0, \hat{X}_i = 0) = 1 - p_0$. The event $(x_{i-1} = 0, \hat{X}_i = 1)$ has zero probability. When $(X_{i-1} = j, \hat{X}_i = 0)$, $1 \le j < k$, an error occurs when $X_i = j - 1$. This is assigned a probability $\epsilon$. The remaining probability $1 - \epsilon$ is split between $P(X_i = j \mid x_{i-1} = j, \hat{x}_i = 0)$ and $P(X_i = j + 1 \mid x_{i-1} = j, \hat{x}_i = 0)$ according to their transition probabilities. In a similar fashion, we obtain all the columns in Table II.

[Table: The conditional distribution $P(X_i \mid x_{i-1}, \hat{x}_i)$]
We can show that the distortion criterion (11) can be cast in the form

$$d_n(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} \left( -c \log_2 P(x_i \mid \hat{x}_i, x_{i-1}) + d_0(x_{i-1}, x_i) \right) \tag{13}$$

or equivalently

$$e(\hat{x}_i, x_{i-1}, x_i) = -c \log_2 P(x_i \mid \hat{x}_i, x_{i-1}) + d_0(x_{i-1}, x_i), \tag{14}$$

thereby proving that the distribution in Table II is optimal. This is done by determining the values of c, $d_0(x_{i-1}, x_i)$, $1 \le x_{i-1}, x_i \le k$. Using the values from Tables I and II in (14), we can find c, $d_0(\cdot, \cdot)$.

Since the process $\{X, \hat{X}\}$ is jointly stationary and ergodic, the distortion constraint is equivalent to $E[e(\hat{x}_2, x_1, x_2)] \le D$. To calculate the expected distortion

$$E[e(\hat{x}_2, x_1, x_2)] = \sum_{x_1, x_2, \hat{x}_2} P(x_1, x_2) P(\hat{x}_2 \mid x_1, x_2) \cdot e(\hat{x}_2, x_1, x_2), \tag{15}$$

we need the (optimum achieving) conditional distribution $P(\hat{X}_2 \mid x_1, x_2)$. This is found by substituting the values from Table II in the relation

$$P(\hat{x}_2 \mid x_1, x_2) = \frac{P(\hat{x}_2 \mid x_1) P(x_2 \mid \hat{x}_2, x_1)}{\sum_{\hat{x}_2} P(\hat{x}_2 \mid x_1) P(x_2 \mid \hat{x}_2, x_1)}. \tag{16}$$

Thus we obtain the conditional distribution $P(\hat{X}_2 \mid x_1, x_2)$ shown in Table III. Using this in (15), we get

$$E[e(\hat{x}_2, x_1, x_2)] = (1 - \pi_0)\epsilon \le D. \tag{17}$$

We can now calculate the rate-distortion function as

$$\sum_{x_1, x_2, \hat{x}_2} P(x_1, x_2, \hat{x}_2) \log \frac{P(x_2 \mid x_1, \hat{x}_2)}{P(x_2 \mid x_1)} \tag{18}$$

to obtain the expression in Proposition 1.

B. Gauss-Markov Source

Consider a stationary, ergodic, first-order Gauss-Markov source X with mean 0, correlation $\rho$ and variance $\sigma^2$:

$$X_n = \rho X_{n-1} + N_n, \quad \forall n, \tag{19}$$

where $\{N_n\}$ are independent, identically distributed Gaussian random variables with mean 0 and variance $(1 - \rho^2)\sigma^2$. Suppose that the source has feed-forward with delay 1 and we want to reconstruct at every time instant n the linear combination $aX_n + bX_{n-1}$, for any constants a, b. We use the mean-squared error distortion criterion:

$$d_n(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{x}_i - (a x_i + b x_{i-1}) \right)^2. \tag{20}$$

The feed-forward distortion-rate function for this source with average mean-squared error distortion was given in [6]. The feed-forward rate-distortion function can also be obtained using Theorem 2 as (proof omitted)

$$R_{ff}(D) = \frac{1}{2} \log \frac{\sigma^2 (1 - \rho^2)}{D / a^2}. \tag{21}$$

We must mention here that the rate-distortion function in the first example cannot be computed using the techniques in [6].
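The Gauss-Markov feed-forward rate-distortion formula, $R_{ff}(D) = \frac{1}{2} \log \frac{\sigma^2(1-\rho^2)}{D/a^2}$, can be evaluated with a few lines of Python. This is a sketch under two assumptions that are not stated in the text: the logarithm is taken to base 2 so the rate is in bits per sample, and the rate is clipped to zero once D reaches $a^2 \sigma^2 (1 - \rho^2)$. The function name `gauss_markov_rff` is hypothetical.

```python
import math

def gauss_markov_rff(D: float, sigma2: float, rho: float, a: float) -> float:
    """Feed-forward rate (bits/sample) for reconstructing a*X_n + b*X_{n-1}
    from a first-order Gauss-Markov source with delay-1 feed-forward.

    The constant b does not enter the formula.
    """
    if D <= 0:
        raise ValueError("D must be positive")
    # Variance of a * N_n, the part of a*X_n + b*X_{n-1} that is new at time n.
    innovation = (a ** 2) * sigma2 * (1.0 - rho ** 2)
    if D >= innovation:
        return 0.0  # assumption: no rate is needed beyond this distortion
    return 0.5 * math.log2(innovation / D)

print(gauss_markov_rff(D=0.25, sigma2=1.0, rho=0.0, a=1.0))
print(gauss_markov_rff(D=0.1, sigma2=1.0, rho=0.9, a=2.0))
```

One way to read the absence of b is that, with delay-1 feed-forward, the decoder already knows $X_{n-1}$ when it reconstructs the sample at time n, so only the innovation $aN_n$ has to be described.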



IV. Channel Coding with Feedback 

In this section, we consider channels with feedback and the problem of evaluating their capacity. A channel is defined as a sequence of probability distributions:

$$P^{ch}_{Y|X} = \left\{ P^{ch}_{Y_n \mid X^n, Y^{n-1}} \right\}_{n=1}^{\infty}. \tag{22}$$

In the above, $X_n$ and $Y_n$ are the channel input and output symbols at time n, respectively. The channel is assumed to have k-delay feedback ($1 \le k < \infty$). This means at time instant n, the encoder has perfect knowledge of the channel outputs until time $n - k$ to produce the input $x_n$. The input distribution to the channel is denoted by $P_{X|Y} = \{P_{X_n \mid X^{n-1}, Y^{n-k}}\}_{n=1}^{\infty}$. In the sequel, we will need the following product quantities corresponding to the channel and the input.

$$\vec{P}^{ch}_{Y^n|X^n} = \prod_{i=1}^{n} P^{ch}_{Y_i \mid X^i, Y^{i-1}}, \qquad \vec{P}^k_{X^n|Y^n} = \prod_{i=1}^{n} P_{X_i \mid X^{i-1}, Y^{i-k}}. \tag{23}$$

The joint distribution of the system is given by $P_{X,Y} = \{P_{X^n, Y^n}\}_{n=1}^{\infty}$, where $P_{X^n, Y^n} = \vec{P}^k_{X^n|Y^n} \cdot \vec{P}^{ch}_{Y^n|X^n}$.

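Definition 3: An $(N, 2^{NR})$ channel code with delay k feedback of block length N and rate R consists of a sequence of encoder mappings $e_i$, $i = 1, \ldots, N$ and a decoder g, where

$$e_i : \{1, \ldots, 2^{NR}\} \times \mathcal{Y}^{i-k} \to \mathcal{X}, \quad i = 1, \ldots, N,$$

$$g : \mathcal{Y}^N \to \{1, \ldots, 2^{NR}\}.$$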



Thus it is desired to transmit one of $2^{NR}$ messages over the channel in N units of time. There is an associated cost function for using the channel given by $c_N(X^N, Y^N)$. For example, this could be the average power of the input symbols. Note that in general, we have allowed the cost function at time N to depend on the inputs and the outputs until time N. This is because the encoder knows the outputs (with some delay) due to the feedback, and can potentially use this information to choose future input symbols to satisfy the cost constraint.

If W is the message that was transmitted, then the probability of error is $P_e = \Pr(g(Y^N) \ne W)$.

Definition 4: R is an $(\epsilon, \delta)$-achievable rate at cost C if for all sufficiently large N, there exists an $(N, 2^{NR})$ channel code such that

$$P_e < \epsilon \quad \text{and} \quad \Pr(c_N(X^N, Y^N) > C) < \delta.$$

R is an achievable rate at cost C if it is $(\epsilon, \delta)$-achievable for every $\epsilon, \delta > 0$.

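Theorem 3: [5] For an arbitrary channel $P^{ch}_{Y|X}$, the capacity with k-delay feedback, the supremum of all achievable rates at cost C, is given by$^2$

$$C_{fb}(C) = \sup_{P_{X|Y} :\, \rho(P_{X|Y}) \le C} \underline{I}(X \to Y), \tag{24}$$

where

$$\underline{I}(X \to Y) = \liminf_{\text{inprob}} \frac{1}{n} \log \frac{\vec{P}^{ch}_{Y^n|X^n}}{P_{Y^n}}$$

and

$$\rho(P_{X|Y}) = \limsup_{\text{inprob}} c_n(X^n, Y^n) = \inf\left\{ h : \lim_{n \to \infty} P_{X^n, Y^n}\left((x^n, y^n) : c_n(x^n, y^n) > h\right) = 0 \right\}.$$

$^2$ The lim inf inprob of a random sequence $A_n$ is defined as the largest number $\alpha$ such that $\lim_{n \to \infty} P(A_n < \alpha) = 0$ and is denoted $\underline{A}$.

In the above, we note that

$$P_{Y^n} = \sum_{x^n} P_{X^n, Y^n} = \sum_{x^n} \vec{P}^k_{X^n|Y^n} \cdot \vec{P}^{ch}_{Y^n|X^n}.$$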

A. Evaluating the Channel Capacity with Feedback 

The capacity formula in Theorem 3 is a multi-letter expression involving optimizing the function $\underline{I}(X \to Y)$ over an infinite dimensional space of input distributions $P_{X|Y}$. Just like we did with sources, we can pose the following question: Given a channel $P^{ch}_{Y|X}$ and an input distribution $P_{X|Y}$, for what sequence of cost measures does $P_{X|Y}$ achieve the supremum in the capacity formula?

The following theorem (proof omitted) provides an answer.

Theorem 4: Suppose we are given a channel $P^{ch}_{Y|X}$ with k-delay feedback and an input distribution $P_{X|Y}$ such that the joint process $P_{X,Y}$ is stationary, ergodic. Then the input distribution $P_{X|Y}$ achieves the k-delay feedback capacity of the channel if for all sufficiently large n, the cost measure satisfies

$$c_n(x^n, y^n) = \lambda \cdot \frac{1}{n} \log \frac{\vec{P}^{ch}_{Y^n|X^n}(y^n \mid x^n)}{P_{Y^n}(y^n)} + d_0, \tag{25}$$

where $\lambda$ is any positive number and $d_0$ is an arbitrary constant. The cost constraint in this case is equal to $\limsup_{n \to \infty} c_n(x^n, y^n)$.